ABSTRACT

Single-stranded DNA (ssDNA) binding proteins are 
important in basal metabolic pathways for gene tran- 
scription, recombination, DNA repair and replication 
in all domains of life. Their main cellular role is to 
stabilize melted duplex DNA and protect genomic 
DNA from degradation. We have uncovered the molecular
function of the domain of unknown function DUF2128
(PF09901) protein family as a novel ssDNA-binding
domain. This bacterial domain
strongly associates into a dimer and presents a 
highly positively charged surface that is consistent 
with its function in non-specific ssDNA binding. 
Lactococcus lactis YdbC is a representative of 
DUF2128. The solution NMR structures of the
20 kDa apo-YdbC dimer and of the YdbC:dT19G1 complex
were determined. The energetics of ssDNA binding to
YdbC were characterized by isothermal titration
calorimetry. YdbC shows comparable nanomolar
affinities for pyrimidine and mixed oligonucleotides, 
and the affinity is sufficiently strong to disrupt duplex 
DNA. In addition, YdbC binds with lower affinity to 
ssRNA, making it a versatile nucleic acid-binding 
domain. The DUF2128 family is related to the eukaryotic
nuclear protein positive cofactor 4 (PC4) family
and to the PUR family both by fold similarity and
molecular function.

INTRODUCTION 

Single-stranded DNA (ssDNA) binding proteins, termed 
SSBs, are ubiquitous in nature and are essential in tran- 
scription, repair and recombination metabolism (1). SSBs 
interact strongly and non-specifically with unwound 
DNA, thereby preventing the formation of secondary 
structure elements and its degradation by nucleases. In 
Escherichia coli, SSBs play an integral role as genome 
maintenance agents that initiate and stimulate the DNA 
repair machinery. The oligosaccharide/oligonucleotide- 
binding domain (OB) fold is the recognized structural sig- 
nature of SSBs in eubacteria. 

Single-stranded DNA-binding domains that deviate from the
canonical OB fold were identified more recently. Among
these domains are the positive cofactor 4 (PC4)/Sub1 (2),
the PUR-α (3) and Deinococcus radiodurans DdrB (4). The
PC4 domain binds ssDNA non-specifically as a dimer,
whereas PUR (purine-rich binding) domains preferentially
bind purine-rich (NGG)n ssDNA and RNA repeats (5).
DdrB is an SSB with a novel fold and is key to 
D. radiodurans resistance to ionizing radiation damage 
(6). The PC4 domain was thought to be unique to the
eukaryotic domain (7), whereas the PUR superfamily was
shown to have representatives in both the eukaryotic and
prokaryotic kingdoms (8). These multifunctional domains 
play a number of distinct roles as transcription co- 
regulators by interacting with basal factors, in mRNA 
transport and in DNA repair pathways (3). PC4 has 
shown disparate functions, acting as both a co-activator 
of transcription factor-mediated RNA Pol-II transcription 
(9) and as a repressor of Pol-II-mediated transcription by 
preventing its phosphorylation (10). Although their affinity
for double-stranded DNA (dsDNA) may only be sufficient
to weaken the helix (11), the domains have the ability to 
sequester ssDNA while sliding or translocating freely along 
the chain (12). 

Before this work, the domain of unknown function
DUF2128 (PF09901) family comprised functionally
uncharacterized proteins found exclusively in
prokaryotes (13). The domain family was targeted for
structural studies by the Protein Structure Initiative (14) 
as part of a broad effort in structural coverage of proteins 
identified in the human gut metagenomic sequencing 
projects (15). The sequence homology of this domain 
family was too low to be matched with sufficient 
accuracy to any other known superfamily, but clues to 
its biochemical function could be gleaned from the know- 
ledge of its structure. The 72-residue (8.40 kDa) YdbC 
protein from Lactococcus lactis is a representative 
member of this protein domain family. Because of its 
use in dairy fermentations and its GRAS (generally 
regarded as safe) status, L. lactis is an important industrial 
microorganism. Its uses are increasingly expanding to ap- 
plications in medicine, including the delivery of recombin- 
ant proteins to humans (16). Many features in the 
proteome of this important microorganism remain to be 
uncovered. Here, we present solution nuclear magnetic 
resonance (NMR) structural and ssDNA binding studies 
of L. lactis YdbC. The protein exhibits unexpectedly high
structural similarity to the symmetric homodimer
structures of the PC4 and PUR-α eukaryotic ssDNA-binding
domains, suggesting a potential ssDNA-binding function
for this protein. We demonstrate that L. lactis YdbC
forms a tight complex with ssDNA, adopting a structure
that closely resembles that of PC4, and we characterize the
binding energetics by microcalorimetry. Moreover, we
show that YdbC can partially disrupt a 26-base DNA
duplex, sequestering the resulting single strands, and is
capable of binding weakly to ssRNA. Using structure-based
sequence and phylogenetic analyses, we place the
DUF2128 protein domain family in its proper evolutionary
context and merge the DUF2128 and the PC4 domain
into the same superfamily.

MATERIALS AND METHODS 

Sample preparation 

The full-length YdbC protein from L. lactis, including a
C-terminal His6 tag (LEHHHHHH), was cloned, expressed
and purified following standard protocols in the literature
to prepare [U-13C,15N]- and [U-5%-13C,100%-15N]-YdbC
samples for NMR spectroscopy (17). Detailed descriptions
of sample preparation and results of biophysical
characterization, including analytical gel filtration,
analytical ultracentrifugation, isothermal titration calorimetry
(ITC) and NMR T1/T2 measurements, can be found in
Supplementary Methods and Supplementary Figures
S1-S4. Protocols for the preparation of YdbC:ssDNA,
YdbC:dsDNA and ssRNA samples are also detailed in
the Supplementary Methods. The expression vector is
available as KR150.21.1 from the Protein Structure
Initiative Materials Repository (http://psimr.asu.edu/).

Structure determination and analysis 

The solution NMR structures of apo-YdbC and the
YdbC:dT19G1 complex were calculated using NOESY
data collected under identical conditions and parameters.
NMR protocols are detailed in the Supplementary
Methods section. Initial apo-YdbC structures were
calculated with CYANA 3.0 (18) using resonance assignments,
NOESY peak lists from 3D 13C-edited and 15N-edited
NOESY and F1-13C/15N-filtered, F3-13C-edited
NOESY spectra, dihedral restraints derived from
TALOS+ (19) and two sets of 1H-15N residual dipolar
couplings (RDCs). Symmetry identity dihedral and
distance restraints were imposed between the two 
protomers to calculate 100 initial structures within 
CYANA 3.0. The final 20 structures with the lowest
target functions were subsequently refined by restrained
molecular dynamics (rMD) in explicit water with
non-crystallographic symmetry restraints and the PARAM19
parameters using CNS 1.3 (20,21). An identical protocol was
followed for the initial YdbC:dT19G1 structure calculations.
The structure was computed with the knowledge that a
single species in solution must include a symmetric protein
dimer and symmetric ssDNA units bound to each YdbC
protomer. Symmetry was enforced both during the initial
CYANA calculations and later during energy refinement
in an explicit water bath. The program was supplied with the
new chemical shift (CS) resonance list, including ambiguous
resonance assignments for thymidine, NOESY peak
lists from 13C/15N-edited 3D NOESY, 2D 1H-1H NOESY and
3D F1-13C/15N-filtered, F3-13C/15N-edited NOESY
spectra and the revised TALOS+ dihedral restraint set
for the complex. Symmetry identity dihedral and
distance restraints were imposed between the two
protomers and between the two dT chains. The 'KEEP'
sub-routine was used in CYANA 3.0 to enforce the
manually assigned protein:dT X-filtered peaks. The best
20 structures from the final cycle were then refined by
rMD in a water bath with non-crystallographic symmetry
and C2 symmetry restraints and OPLSX parameters using the
HADDOCK web server (22). For both the apo- and
ssDNA-bound YdbC structure refinements, experimental 
restraints (nuclear Overhauser effect (NOE)-derived 
distance, dihedral and empirical hydrogen bond) were 
used in the final rMD calculations. Structural statistics
and global structure quality scores for apo-YdbC and
YdbC:dT19G1 were computed using the PSVS 1.4
software package (23). The global RDC statistics for
apo-YdbC were computed using PALES (24). Single-stranded
DNA geometry was analysed with the program
3DNA (25). The final coordinates (excluding the
C-terminal hexa-His polypeptide segment) for the
ensemble of 20 structures and NMR-derived restraints
for both apo- and holo-YdbC were deposited in the
Protein Data Bank (PDB) with IDs 2ltd and 2ltt, respectively.
The CS assignments were deposited in the Biological
Magnetic Resonance Data Bank with entries 18469 and
18496, respectively. Pairwise structure-based sequence
alignments and coordinate superimpositions were 
obtained from the jCE server (26,27). 3D protein structure 
comparison of the apo-YdbC structure with structures in 
the Protein Data Bank was conducted using the DaliLite 
server (28). Conserved residue analysis was performed
with the ConSurf server (29,30) using full-length
sequences from the entire PF09901 (DUF2128) protein
domain family (Pfam 26.0; 414 sequences) re-aligned
with the ClustalW 2.0 server (31). Electrostatic surface
potentials were computed for the first (lowest energy) 
model of the apo-YdbC ensemble using the APBS 
version 1.2.1 software package (32) and PDB2PQR 
version 1.6 server (33). Structure figures were made 
using PyMOL version 1.4 (www.pymol.org). 

Isothermal titration calorimetry 

ITC measurements were conducted at 25 °C on an iTC200
microcalorimeter (MicroCal Inc., Northampton, MA,
USA). All ITC measurements were performed in 10 mM
Tris buffer at pH 7.5 containing 0, 50, 150 or
300 mM NaCl. In each experiment, aliquots of a
220 µM solution of YdbC were sequentially injected
from a 40 µl rotating syringe (1000 r.p.m.) into an isothermal
sample chamber containing 210 µl of an 8 µM solution of an
ssDNA oligonucleotide, either dT19G1, dC20, dA20 (The
Midland Certified Reagent Company) or d(A-C)10
(Integrated DNA Technologies). In each experiment, the
initial injection was 0.4 µl and 0.8 s in duration, whereas
the remaining 19 injections were 2 µl and 4 s in duration
with a 180 s delay between each injection. Each titration
experiment was accompanied by the corresponding
control experiment, in which YdbC was injected into a
solution of buffer alone. Each injection generated a heat
burst curve (µcal/s versus s), the area under which was
determined by integration [using Origin version 7.0
software (MicroCal Inc., Northampton, MA, USA)] to
obtain a measure of the heat associated with that
injection. The heat associated with each
YdbC-buffer injection, as estimated using a linear
regression analysis of the integrated data, was subtracted from
the corresponding heat associated with each
YdbC-ssDNA injection to yield the heat of ssDNA
binding for that injection. After removal of the point
corresponding to the first low-volume injection, the
buffer-corrected ITC profiles for each
YdbC-ssDNA experiment were fit with models for either one
set or two sets of binding sites.
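
The buffer correction and the injection schedule described above can be sketched in a few lines of Python. This is only an illustration, not the Origin workflow used in the study: the function names are hypothetical, the control regression is assumed to be fitted to injections 2 to N only (the first injection has a different volume), and the molar ratio ignores the dilution and overflow corrections that instrument software normally applies.

```python
from statistics import mean


def linear_fit(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def buffer_corrected_heats(titration, control):
    """Per-injection heats of binding with the first injection dropped.

    titration, control: integrated heats per injection, same order/length.
    Assumption: the control line is fitted to injections 2..N only.
    """
    if len(titration) != len(control):
        raise ValueError("titration and control must have the same length")
    injection = list(range(2, len(control) + 1))
    intercept, slope = linear_fit(injection, control[1:])
    return [q - (intercept + slope * i)
            for i, q in zip(injection, titration[1:])]


def cumulative_molar_ratio(volumes_ul, syringe_um=220.0, cell_um=8.0,
                           cell_volume_ul=210.0):
    """[YdbC]/[ssDNA] after each injection (no dilution correction)."""
    ratios, injected = [], 0.0
    for volume in volumes_ul:
        injected += volume
        ratios.append(syringe_um * injected / (cell_um * cell_volume_ul))
    return ratios


if __name__ == "__main__":
    schedule = [0.4] + [2.0] * 19  # injection volumes (µl) from the text
    print(round(cumulative_molar_ratio(schedule)[-1], 2))
```

Under these simplifications the 20 injections total 38.4 µl, so the titration ends at a YdbC:ssDNA molar ratio of roughly 5, well past the 2:1 stoichiometry of the primary binding event.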

Sequence analysis 

Representative homologues of the L. lactis subspecies
lactis sequence YdbC (ID 15672295), of the Homo
sapiens PC4 (ID 62088150) and of the Borrelia
burgdorferi PUR-α (ID 308198561) were selected in
diverse taxonomic groups. BLASTP (34) was used to
identify and retrieve these sequence homologues in
genome and protein databases at NCBI (35). Furthermore,
bacterial homologues of PC4 were identified with
position-specific iterated (PSI)-Basic Local Alignment
Search Tool (BLAST). Sequences within each family were
first aligned using Clustal W (36). Because of the low
sequence similarity between the three families, these
three alignments were manually aligned with BioEdit
version 7.1 (Ibis Biosciences) on the basis of their
structural similarity derived from the jCE server (26,27). Sequence
analysis was based on partial protein sequences
encompassing the full-length DUF2128 domain and
corresponding regions in PC4 and PUR-α sequences. Sixty-two
positions were included in the analysis. Programs of the 
PHYLIP package (37) were used for tree construction. 
The final alignment was re-sampled 100 times with 
Seqboot (37). A matrix of distances was obtained with 
Protdist (37), and used for tree construction with the 
neighbour-joining program Neighbor (37), and a consen- 
sus tree was derived using the program Consense (37). 
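
The re-sampling step can be illustrated with a short Python sketch of column-wise bootstrapping, the general idea behind Seqboot. It is not the PHYLIP implementation: the function name, the dictionary input format and the fixed random seed are assumptions made for the example, and the distance, neighbour-joining and consensus steps are not reproduced.

```python
import random


def bootstrap_alignment(alignment, n_replicates=100, seed=0):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a list of n_replicates dicts with the same names and length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw one set of column indices and apply it to every sequence,
        # so each resampled column keeps its residues together.
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append({name: "".join(alignment[name][c] for c in columns)
                           for name in names})
    return replicates


if __name__ == "__main__":
    toy = {"seqA": "MKLTRW", "seqB": "MKITRW", "seqC": "MRLSKW"}  # made-up data
    for replicate in bootstrap_alignment(toy, n_replicates=3):
        print(replicate)
```

Each replicate would then be passed to the distance and tree-building programs, and the fraction of replicate trees containing a given grouping gives its bootstrap support.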

RESULTS 

Apo-YdbC 

The structure of L. lactis YdbC adopts the dimeric PC4
fold as presented in the stereoview in Figure 1A.
Secondary structure elements are as follows: 7-19 (β1,
β1'), 22-32 (β2, β2'), 37-44 (β3, β3'), 51-57 (β4, β4'),
59-72 (α1, α1'). Each 72-residue protomer has a concave
four-stranded antiparallel sheet followed by a C-terminal
helix. Helices (α1, α1') and strands (β4, β4') from each
subunit form the main dimer interface, which has a buried
surface area of ~2000 Å². Structure statistics for apo-YdbC
are listed in Table 1; the assignment and NOE maps are
shown in Supplementary Figure S5; and the structure
ensemble is shown in Supplementary Figure S6.

ConSurf (29,30) analysis of the DUF2128 sequences for
the entire protein domain family is mapped onto the
structure (Figure 1B) and onto the sequence of L. lactis YdbC
(Figure 2A). Conserved residues occur both in the centre
of the concave β-sheet scaffold, with side-chains extending
into the concave side, and in the β-strand that is part of the
dimer interface (Figure 1B). Conservation within
DUF2128 is especially strong in the β3 (Asp40, Arg42
and Trp44) and β4 (Met51, Lys53, Gly54 and Thr56)
strands. Within helix α1, conservation is limited to
Glu61 and Leu65, which may be key to fold stability.
Several conserved positively charged residues are
involved in ssDNA binding as discussed below.
Clustering of the basic residues Lys4, Lys6, Lys21, Lys50,
Lys53 and Arg42 biases the electrostatic distribution and
produces a strong, uniform positive charge on one face of the
molecule (Figure 1C and Supplementary Figure S7).
The PC4-like fold and charge characteristics provide the first
evidence for the function of YdbC as a nucleic acid-binding
protein. The sequence identity determined by
structure-based alignment (DALI or jCE) to the PC4
and PUR-α domains was found to be 15.3 and 11.8%,
respectively (Figure 2B), and the corresponding Cα
root-mean-square deviation (RMSD) was found to be 2.6 and
4.0 Å. Significant residue conservation in the ssDNA-binding
site, particularly on β3 and β4, was also found
between YdbC and PC4, whereas between YdbC and
PUR-α conservation is remote.

Figure 1. Solution NMR structure of L. lactis apo-YdbC shown in the
identical top-view orientation. (A) Stereoview of dimeric YdbC with
labelled secondary structure elements and amino termini. (B) ConSurf
(29,30) amino acid conservation mapped onto the lowest energy NMR
structure. Highly conserved residues are labelled on the protein
backbone of a single protomer. (C) Solvent exposed electrostatic
potential (32) mapped onto the surface of apo-YdbC. Only the
ssDNA-binding epitope is shown for clarity. Colour scales: (B) variable,
average, conserved; (C) -4.5 to +4.5 kT/e.

YdbC:dT19G1 complex

Strong backbone and side-chain chemical shift perturbations
(CSPs) are observed on YdbC as a result of
ssDNA binding. A 1H-15N heteronuclear single quantum
coherence (HSQC) comparison of apo versus complex
YdbC (Supplementary Figure S8) shows large variations
in amide chemical shifts on binding, typical of slow
exchange on the chemical-shift timescale and consistent
with the nanomolar affinity of YdbC for poly-dT at
low-salt NMR buffer conditions. Similar strong perturbations
are visible in the 1H-13C HSQC for the YdbC
residues at both the protein:protein and protein:DNA
interfaces (e.g. Leu5 and others; data not shown). Full
backbone CS perturbations for apo versus complex
YdbC were computed (41) and mapped onto the
apo-YdbC structure (Figure 3A). The strongest
backbone CS differences are localized in the N-terminal
region (residues 5-7) and at the dimer interface in β4
(residues 53-58). In addition, {1H}-15N heteronuclear
NOEs (hetNOE) were measured for both apo- and
dT19G1-bound YdbC (Supplementary Figure S9) and
their difference (ΔhetNOE) is mapped onto the
apo-YdbC structure (Figure 3B). To a first approximation,
the average increase in the {1H}-15N hetNOE ratio
(~0.07) for the complex relative to apo indicates an overall
increase in structural ordering on poly-dT binding.
Ordering on poly-dT binding is predominant in the
N-terminal region (residues 4-6) and, in addition, in the
β2-β3 loop (residues 35-36) as discussed later in the text.
We predict that these findings would be general for a
variety of ssDNA sequences that bind with affinity
similar to that of poly-dT as measured by ITC. The CS
assignment strategy and findings for bound dT19G1 are described
in Supplementary Figures S10 and S11.
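
As an illustration of how per-residue perturbations can be combined and ranked, the sketch below computes a weighted 1H/15N chemical shift perturbation and bins residues against the mean + 1σ and mean + 2σ thresholds used for colouring in Figure 3. The exact formula of reference (41) is not reproduced in the text, so the weighted Euclidean form, the 15N scaling factor of 0.2 and the use of the population standard deviation are all assumptions; the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def combined_csp(delta_h, delta_n, n_weight=0.2):
    """Combined amide shift change (ppm) from 1H and 15N differences.

    n_weight scales the 15N difference; 0.2 is an assumed, commonly used
    value, not one taken from the source.
    """
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def classify_by_deviation(values):
    """Bin residues by how far their value lies above the mean.

    values: dict mapping residue number -> CSP (or any per-residue score).
    """
    data = list(values.values())
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (assumption)
    labels = {}
    for residue, value in values.items():
        if value > mu + 2 * sigma:
            labels[residue] = "above mean + 2 sigma"
        elif value > mu + sigma:
            labels[residue] = "above mean + 1 sigma"
        else:
            labels[residue] = "below mean + 1 sigma"
    return labels


if __name__ == "__main__":
    # Made-up shift differences (ppm) for three residues, for illustration.
    shifts = {5: (0.30, 2.0), 20: (0.02, 0.1), 40: (0.05, 0.3)}
    csps = {res: combined_csp(dh, dn) for res, (dh, dn) in shifts.items()}
    print(classify_by_deviation(csps))
```

The same binning function can be applied to ΔhetNOE values, since it only needs one number per residue.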

The complex structure is shown in Figure 4A; the top and
side views of the complex assembly in Figure 4B and C show
the numbering of the two symmetric poly-dT segments.
Structural statistics for the protein-ssDNA complex are
reported in Table 1, and a view of the final ensemble is
shown in Supplementary Figure S12. CS averaging and
degeneracy impede the structural characterization of the
ssDNA loop and terminal regions and the identification of
position-specific protein:ssDNA contacts. Site-specific
protein to ssDNA contacts are shown in Figure 4D.
YdbC to poly-dT hydrogen bond interactions that were
identified in the NOE assignment protocol are indicated
with dashed lines. Seven YdbC:dT interaction sites were
identified. The protein:ssDNA interactions that are fully
supported by NMR data include (i) strong aromatic
stacking interactions between Trp23:T2 and Trp32:T5;
and (ii) hydrophobic contacts Leu5(Hδ1,2):T4-T5,
Phe7(Hδ,ε):T4, Ala20(Hβ):T1, Ala35(Hβ):T6,
Thr43(Hγ2):T2, Met51(Hε):T4 and Thr56(Hγ2):T7.
The strongly conserved Asp40(Oγ):T5 and Arg42(Hε,
Hη):T4,T5 contacts form key side-chain to base
hydrogen bond interactions in the core site of the
complex. Lys21, Asn33, Lys50, Lys53 and Glu61 are
active participants in complex formation via hydrogen
bonding and/or hydrophobic side-chain stacking to dT.
Cross-peaks between HN and Hβ, γ, δ, ε of these
residues and the dT H1', H7 and H6 are identified in the
X-filtered NOESY spectrum. The protein to ssDNA
surface contact area is ~4200 Å². Single-stranded DNA
(dT19G1) dihedral angles and sugar angles and puckering
conformations are listed with the usual numbering
convention (T1-T6 and T1'-T6') in Supplementary Tables S1
and S2, respectively. The bases were found to be in the
'anti' conformation for the χ torsion angle with the exception
of T6 (T6'), and in the 'endo' sugar ring pucker except for
T3 (T3'). The base-to-protein contacts are mapped as a
schematic view in Supplementary Figure S13.

The structures of apo and complexed YdbC were
superposed using the combinatorial extension (CE) algorithm
(27) in PyMOL as shown in Supplementary Figure
S14A. Changes in the β4 secondary structure length are
apparent, together with differences in the β3-β4 loop
orientation and the β1 positioning. Overall, the β structure,
which is more concave in the apo form, becomes slightly more
open in the complex and, similarly to PC4 (42), the
N-terminus becomes highly ordered in the complex.
YdbC retains structural similarity to human PC4 [PDB
ID: 1pcf (apo) or 2c62 (complex)] (7) as clearly seen
in Supplementary Figure S14B, but with a higher
Table 1. Summary of NMR structural statistics for the apo-YdbC and YdbC:dT19G1 ensemblesᵃ

Data type    apo    holo
Completeness of resonance assignmentsᵇ
  Backbone (%)    97    91
  Side-chain (%)    93    93
  Aromatic (%)    100    100
  Stereospecific methyl (%)    100    100
Conformationally restricting restraintsᶜ
  NOE restraints
    Total    3142    2258
    Intra-residue (i = j)    673    362
    Sequential (|i - j| = 1)    819    562
    Medium range (1 < |i - j| < 5)    480    288
    Long range (|i - j| > 5)    1170    1046
    NOE restraints/residue    42    14
    Interchain protein/protein NOEs    244    184
    Interchain protein/ssDNA NOEs    254
  Dihedral angle restraints    330    446
  Hydrogen bond restraints    128    120
  NH RDC restraints (polyethylene glycol (PEG) + phage)    222
  Number of restraints/residue (total/long range)    48/16.6    17.4/6.5
Residual constraint violationsᶜ
  Average distance restraint violations/structure
    0.1-0.2 Å    19.8    30.9
    0.2-0.5 Å    3.6    11.7
    >0.5 Å    0.0    0.3
  Average RMS of distance violation/restraint (Å)    0.02    0.03
  Maximum distance violation (Å)    0.45    0.61
  Average RMS dihedral angle violations/structure
    >1°-10°    18.1    40.1
    >10°    0.4    1.35
  Average RMS dihedral angle violation/restraint    1.0    1.1
  Maximum dihedral angle violation (°)    11.0    20.5
Model qualityᶜ
  RMSD from average coordinates (Å)
    All backbone atoms (ordered/all)    0.6/1.7    1.0/1.4
    All heavy atoms (ordered/all)    0.9/2.3    0.4/0.4
  RMSD bond lengths (Å)    0.018    0.004
  RMSD bond angles (°)    1.3    0.7
  MolProbity Ramachandran plotᵈ
    Most favoured regions (%)    95.7    90.9
    Additionally allowed regions (%)    4.2    9.1
    Disallowed regions (%)    0.1    0.0
  Global quality scores (Raw/Z-score)ᶜ
    Procheck G-factor (φ,ψ)ᵈ    -0.47/-1.53    -0.55/-1.85
    Procheck G-factor (all dihedrals)ᵈ    -0.18/-1.06    -0.43/-2.54
    Verify3D    0.38/-1.28    0.39/-1.12
    ProsaII    0.40/-1.03    0.53/-0.50
    MolProbity clashscore    14.52/-0.97    21.6/-2.18
RPF scoresᵉ
  Recall/Precision    86.8/89.1
  F measure/DP score    87.9/71.8
Residual dipolar couplings (RDC) scoresᶠ
  Q-factor (PEG/phage)    0.20/0.18
  R (PEG/phage)    0.97/0.98

ᵃStructural statistics were computed for the ensembles of 20 deposited structures (PDB ID: 2ltd and 2ltt) using PSVS
(23).
ᵇComputed for residues 1-74. Resonances that were not included were exchangeable protons (N-terminal NH3+, Lys
NH3+, Arg NH2, Cys SH, Ser/Thr/Tyr OH) and Pro N, C-terminal carbonyl, side-chain carbonyl and non-protonated
aromatic carbons.
ᶜAverage distance constraints were calculated using the sum of r⁻⁶.
ᵈOrdered residue ranges [S(φ) + S(ψ) > 1.8]: 3-74 (chain A), 3-74 (chain B). Secondary structure elements APO: 7-19
(β1, β1'), 22-32 (β2, β2'), 37-44 (β3, β3'), 51-57 (β4, β4'), 59-72 (α1, α1'). Secondary structure elements HOLO: 7-17
(β1, β1'), 24-32 (β2, β2'), 36-44 (β3, β3'), 55-57 (β4, β4'), 59-72 (α1, α1').
ᵉRPF scores (38) reflecting the goodness-of-fit of the final ensemble of structures (including disordered residues) to the
NOESY data and resonance assignments.
ᶠResidual dipolar coupling quality scores (24).
Figure 2. (A) Structure-based sequence alignment (26,27) of L. lactis YdbC (DUF2128; PF09901), H. sapiens PC4 (PF02229) and B. burgdorferi
PUR-α (DUF3276; PF11680). (Top) Sequence alignment rendered by ESPript (42) using default parameters for residue similarity calculations, where
boxed residues represent identical (red box, white character) and similar (red character) amino acid conservation. (Bottom) Sequence alignment
rendered using ConSurf (29,30) where residue conservation across individual protein domain families ranges from highly conserved (magenta) to
variable (cyan). (B) Comparison of the solution NMR structure of L. lactis YdbC with crystal structures of structurally similar apo-forms of dimeric
ssDNA-binding proteins, H. sapiens PC4 (PDB ID: 1pcf) (43) and B. burgdorferi PUR-α (PDB ID: 3nm7) (8).
Figure 3. NMR characterization of poly-dT binding to L. lactis YdbC. (A) CSPs (Δδcomp) histogram. The bottom panel shows colour-coded residues
defined according to the magnitude of the deviation from the mean CSP (green dotted line); yellow dotted line: mean + 1σ; red dotted line:
mean + 2σ. The CSPs are mapped onto the apo-YdbC structure in tube representation. (B) {1H}-15N heteronuclear NOE difference (ΔhetNOE)
between ssDNA-bound and apo-YdbC. The histogram (bottom panel) shows colour-coded residues defined according to the magnitude of the deviation
from the mean ΔhetNOE (cyan dotted line); purple dotted line: mean + 1σ; magenta dotted line: mean + 2σ. The ΔhetNOEs are mapped onto the
apo-YdbC structure in tube representation with the same colouring scheme.
Figure 4. Solution NMR structure of the YdbC:dT19G1 complex. (A)
Cartoon stereoview with labelled β4 dimer interface element and
structured ssDNA segments and their termini. (B and C) Top and
side view of the complex with labelled and coloured dT bases (T1-T7).
For visual clarity, one side has been greyed out. (D) Detailed view of
each dT:protein interaction site for dT1-dT7. Residues showing
hydrophobic interactions <5 Å have been included. Dashed lines
represent H-bond interactions within the typical range (2.7-3.1 Å). Base-base
stacking between dT4 and dT5 was found; protein aromatic to base
stacking was present between Trp23 and dT2 and Trp32 and dT5.



root-mean-square deviation because of differences in the 
secondary and tertiary structures of the termini. 

EF_3132 of Enterococcus faecalis from the same
DUF2128 family exhibits an even more dramatic relaxation
behaviour, as the 1H-15N HSQC spectrum is
broadened beyond detection and becomes observable
only in the presence of dT19G1 (Supplementary
Figure S15), indicating that binding causes a change in the
conformational exchange properties.

ssDNA binding properties of YdbC 

To assess the affinity and sequence specificity of YdbC for
ssDNA, the energetics of the DNA-binding interaction
between YdbC and selected 20mer single-stranded
oligonucleotides were determined using ITC (Figure 5 and
Table 2). The primary binding event for each interaction
studied has a stoichiometry (N) of two YdbC to one
oligonucleotide, indicating that YdbC binds to ssDNA as a
dimer, as expected from the high association affinity of
YdbC subunits. Additional low-affinity interactions
occur when dT19G1 and dC20 are used (KD > 1 µM). The
presence of secondary interactions is evident in the
integrated plots for dT19G1 and dC20 as non-linear
portions in the [YdbC]/[ssDNA] > 2 region of the curve.
The secondary interactions between YdbC and both
dT19G1 and dC20 show a high degree of uncertainty and
salt concentration dependence. The interactions are
eliminated by increasing the NaCl concentration to
300 mM (Supplementary Figure S16), indicating that
these weak interactions are non-specific and
electrostatically driven and might not be physiologically
relevant for the function of YdbC. The primary
interactions between YdbC and the dT19G1, dC20 and d(A-C)10
oligonucleotides each have dissociation constants (KD)
within a ~4-fold range, from 11 to 39 nM, under
physiologically relevant conditions (pH 7.5, 150 mM NaCl). In
contrast, the affinity of YdbC for dA20 (KD = 11 µM) is
markedly less than that observed for the other
oligonucleotides. Although indicative of reduced specificity
for polypurine sequences, low affinities and unfavourable
enthalpic contributions to binding for poly-A sequences
are common features of non-specific ssDNA-binding
proteins because of the coupled energetic cost of
de-stacking adjacent adenine residues on protein binding
(43,44). The similar affinity of YdbC for the alternating
purine-pyrimidine sequence d(A-C)10 and for the
pyrimidine-rich sequences, dT19G1 and dC20, provides further
evidence that the lack of affinity of YdbC for dA20 is
mechanistic in nature and does not reflect the presence
of sequence-specific contacts in the YdbC:ssDNA
complex.
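
For readers who want to relate the dissociation constants to binding free energies, the following sketch applies the standard relations ΔG = RT ln(KD) and ΔG = ΔH - TΔS. The text does not state how the tabulated quantities were derived, so this is only an assumed, textbook conversion: it takes a 1 M standard state and T = 298.15 K (the 25 °C given in the Methods), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (M)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def entropy_from_g_h(delta_g, delta_h, temp_k=298.15):
    """Binding entropy (cal/(mol*K)) from dG and dH (both in kcal/mol)."""
    return (delta_h - delta_g) / temp_k * 1000.0


if __name__ == "__main__":
    # KD values quoted in the text: 11 nM and 39 nM (primary sites),
    # 11 µM (dA20).
    for label, kd in [("11 nM", 11e-9), ("39 nM", 39e-9), ("11 uM", 11e-6)]:
        print(label, round(delta_g_from_kd(kd), 2), "kcal/mol")
```

On this basis a 4-fold difference in KD corresponds to less than 1 kcal/mol in ΔG, whereas the roughly 1000-fold weaker binding of dA20 costs about 4 kcal/mol.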

Binding of YdbC to dsDNA and ssRNA 

PC4 has the capacity to disrupt duplex DNA at low ionic
strength and micromolar protein concentrations (11).
Analogously, we found that YdbC can disrupt a 26-base
DNA duplex with the sequence
5'-GGATTTGGTTTCAAAAAGAAAAAAGG-3' (and its complement) and bind to
the resulting ssDNA while retaining an overall
structure similar to that of the YdbC:dT19G1 complex
(Supplementary Figure S17). At 0.3 mM YdbC and
100 mM ionic strength, a 35 kDa YdbC:dsDNA complex,
consistent with the combined masses, is formed that shows
nearly identical HSQC amide chemical shifts compared
with the YdbC:dT19G1 complex. In addition, despite the
different DNA sequence, the key Trp-base stacking
interactions seem to be recapitulated based on the position of
the Trp23, Trp32 and Trp44 side-chain ε1 amides. These
are markedly distinct from the positions in the apo-YdbC
spectrum (Supplementary Figure S17). These spectral
features are consistent with a model in which the
dsDNA structure has been disrupted to form a
YdbC:ssDNA-type complex.

Given the overall fold similarity of YdbC to PUR-α
(Figure 2) and to establish their functional relationships
more clearly, we examined the binding of YdbC to
ssRNA. YdbC binding to an ssRNA with the sequence
AGACAGCAUUAUGGUGUCUUU was studied by
analytical gel filtration and titrations monitored by
1H-15N HSQC (Supplementary Figure S18).
Interestingly, we found that YdbC binds ssRNA with
Figure 5. ssDNA-binding profiles for YdbC at 25 °C and 150 mM
NaCl. (Top) Thermal power versus time with legend added for clarity.
ITC thermograms for the injection of 220 µM YdbC into 8 µM solutions
of d(A-C)10 (green), dA20 (blue), dT19G1 (black) and dC20 (red).
Each heat burst curve corresponds to the injection of 2 µl of a solution
of YdbC into a solution of the ssDNA oligo. (Bottom) Injection heat
versus YdbC/ssDNA ratio. The thermograms in the top panel were
integrated to create the binding isotherms, shown with the same
colour-coding as in the top panel. The binding isotherms were fit (solid lines) with
models for one [d(A-C)10 and dA20] or two (dT19G1 and dC20) sets of
binding sites.



low to moderate affinity. The complex can be isolated
by gel filtration chromatography at ~0.3 mM
YdbC:ssRNA concentration. The CS perturbations
mapped onto the structure point to a similar binding
region for both ssRNA and ssDNA. The linear trajectory
of the change in 1H-15N chemical shifts versus ssRNA:protein
ratio indicates a two-state fast-exchange binding model
(45). A two-parameter equation was used to fit the data
and derive a value of KD ~70 µM. The authors thank
anonymous reviewers for suggesting detailed characterization
of dsDNA and ssRNA binding to YdbC.
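
The fitted equation is not given in the text. A commonly used two-parameter form for fast-exchange titrations is the 1:1 binding isotherm, in which the observed shift change equals a maximal shift change multiplied by the bound fraction; the sketch below assumes that form, with KD and the maximal shift as the two parameters. The function names, the grid search (instead of non-linear least squares) and the synthetic titration points are illustrative assumptions only, and the protein concentration is taken as the concentration of binding units.

```python
import math


def bound_fraction(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in M)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)


def observed_shift(p_total, l_total, kd, delta_max):
    """Fast-exchange shift change: population-weighted average."""
    return delta_max * bound_fraction(p_total, l_total, kd)


def fit_kd(p_total, l_totals, shifts, kd_grid):
    """Grid search over KD; delta_max is solved by linear least squares.

    Returns (kd, delta_max) with the smallest sum of squared residuals.
    """
    best = None
    for kd in kd_grid:
        f = [bound_fraction(p_total, l, kd) for l in l_totals]
        delta_max = (sum(fi * yi for fi, yi in zip(f, shifts))
                     / sum(fi * fi for fi in f))
        sse = sum((delta_max * fi - yi) ** 2 for fi, yi in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_max)
    return best[1], best[2]


if __name__ == "__main__":
    protein = 3e-4                                  # 0.3 mM, as in the text
    ligand = [0.5e-4, 1e-4, 2e-4, 4e-4, 8e-4]       # made-up titration points
    data = [observed_shift(protein, l, 7e-5, 0.2) for l in ligand]
    grid = [k * 1e-5 for k in range(1, 16)]
    print(fit_kd(protein, ligand, data, grid))
```

In practice the grid search would be replaced by a proper non-linear fit with error estimates, and several residues would be fitted jointly to a shared KD.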

Taxonomic distribution and sequence analysis 

A search of sequenced genomes was conducted with the 
current (May 2012) NCBI database (35), to assess the 
extent of the taxonomic distribution of homologues of 
L. lactis YdbC, within the DUF2128 (PF09901) family. 
The genomes of 1831 bacterial, 101 archaeal and 181 eu- 
karyotic species were searched using YdbC. Homologues 
were found in prokaryotic strains of the phyla Firmicutes 
(226 among bacilli, Clostridia and others), spirochaetes 
(8 strains), Tenericutes (7 strains) and fusobacteria 
(5 strains). Four members of the archaeal genus 
Methanococcus also possess a homologue of YdbC. No 
related sequences were found in other prokaryotic phyla 
or in the eukaryotic genomes searched. Details of the 
search results are provided in Supplementary Table S3. 
Interestingly, the prokaryotic species encoding YdbC
homologues also possess a homologue of SSB
(GenBank 37999773), suggesting that YdbC plays a
complementary role to that of SSB in these species. In
addition, PC4 and PUR-α, two proteins known to bind
ssDNA, are structurally similar to YdbC. Both PC4 and
PUR-α are found in eukaryotes and in bacteria but are
absent in archaea. Initial BLASTP searches for PC4
homologues in bacteria returned no significant results;
therefore, we conducted PSI-BLAST and protein domain
searches using the Conserved Domain Architecture
Retrieval Tool at NCBI (46) and Pfam 26.0.
Twenty-four PC4 sequences were found in bacteria,
mostly in proteobacteria (10 sequences) and spirochaetes
(10 sequences). In addition, the PC4 sequence of the
Firmicute Acetivibrio cellulolyticus was found only in
Pfam. The PC4 domain occurs as a single unit or as part
of multidomain proteins, where it can be present in



Table 2. ITC-derived parameters for the binding of YdbC to selected 20mer oligonucleotides

Oligonucleotide  Binding site  KD (M)               ΔG (kcal/mol)  ΔH (kcal/mol)  ΔS (cal/mol·K)  n
dT19G1           1             (1.6 ± 0.6) × 10⁻⁸   -10.6 ± 0.3    -10.1 ± 0.1    1.8 ± 1.3       2.2 ± 0.1
                 2             (7.7 ± 3.0) × 10⁻⁶   -7.0 ± 0.3     -4.3 ± 0.6     9.0 ± 3.0       2 fixed
dC20             1             (1.1 ± 0.6) × 10⁻⁸   -10.8 ± 0.4    -7.8 ± 0.1     10.2 ± 1.7      2.2 ± 0.1
                 2             (1.7 ± 0.9) × 10⁻⁶   -7.9 ± 0.4     -1.6 ± 0.2     -21.0 ± 2.0     2 fixed
dA20             1             (1.1 ± 0.6) × 10⁻⁵   -6.8 ± 0.5     -1.8 ± 0.6     16.6 ± 3.6      2.1 ± 0.5
d(A-C)10         1             (3.9 ± 0.5) × 10⁻⁸   -10.1 ± 0.1    -10.3 ± 0.1    -0.6 ± 0.6      2.0 ± 0.1

The ITC profiles shown in Figure 5 were fit with models for either one [dA20 and d(A-C)10] or two (dT19G1 and dC20) independent sets of binding
sites. All parameters were allowed to float during the fitting routines except for values of n for site 2 in dT19G1 and dC20, which were manually varied
to yield the best fit (as reflected by minimization of χ²). The indicated uncertainties in the fitted values reflect the standard deviation of the
experimental data from the fitted curves. Values for ΔG and ΔS were calculated using the standard formalisms, with the maximum errors
carried through the equations.
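
The "standard formalisms" for ΔG and ΔS mentioned in the table footnote are presumably ΔG = RT ln(KD) and ΔS = (ΔH - ΔG)/T. The following minimal sketch assumes those relations and a temperature of 25°C (taken from the Figure 5 caption). The function name is hypothetical, and error propagation is not shown.

```python
import math

R_KCAL = 1.98720425864083e-3   # gas constant, kcal/(mol*K)

def binding_thermodynamics(kd_molar, dh_kcal, temp_k=298.15):
    """Return (dG in kcal/mol, dS in cal/(mol*K)) from KD and dH.

    dG = RT ln(KD) for a dissociation constant referenced to 1 M,
    and dS = (dH - dG) / T.
    """
    dg = R_KCAL * temp_k * math.log(kd_molar)
    ds = (dh_kcal - dg) / temp_k * 1000.0   # kcal -> cal
    return dg, ds

# KD and dH for site 1 of dT19G1 and for d(A-C)10, taken from Table 2.
for name, kd, dh in [("dT19G1 site 1", 1.6e-8, -10.1),
                     ("d(A-C)10", 3.9e-8, -10.3)]:
    dg, ds = binding_thermodynamics(kd, dh)
    print(f"{name}: dG = {dg:.1f} kcal/mol, dS = {ds:.1f} cal/(mol*K)")
```

For these two rows the computed ΔG and ΔS agree with the tabulated values to within rounding.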
tandem repeats. All the bacterial sequences are
single-domain proteins containing only the PC4 domain.
The distribution of putative PUR-α homologues in
bacteria is also limited to a few phyla, namely
Bacteroidetes and spirochaetes. To better understand the
relationships between the DUF2128, PC4 and PUR-α protein
families, putative YdbC homologues from representative
strains were analysed together with sequences from the PC4
and PUR-α families of ssDNA-binding proteins. Although
these three families are structurally similar, they differ at
the level of amino acid sequence, and accordingly they
form three distinct clusters (Figure 6). However, the
DUF2128 and PC4 clusters seem to be more closely
related to each other than to the PUR-α clade. Within
the PUR-α and the PC4 clusters, eukaryotic and bacterial
sequences branch separately. Furthermore, the bacterial
PC4 homologues constitute a loose group, with the
A. cellulolyticus PC4 sequence forming a deep branch
with sequences of the DUF2128 family.

To further clarify the function of YdbC, the genomic 
context of YdbC homologues was examined in the micro- 
bial chromosomes. This analysis was carried out with the 
YdbC amino acid sequence to search the database of 
Protein Clusters at NCBI, followed by retrieval of 
genomic neighbourhoods using the ProtMap function. 
The results show that the genome context of all the 
YdbC homologues differs, suggesting that YdbC is 
encoded by a monocistronic transcript. This observation 
is also consistent with the presence of the putative riboso- 
mal binding site AGAAAGGA (47) located six nucleo- 
tides upstream from the start codon of the ydbC gene, 
and the fact that the gene downstream is transcribed in 
the opposite direction with respect to ydbC. A similar 
analysis of the genome context was also performed using
PC4, SSB and PUR-α protein sequences. Similar to what
is observed for YdbC, the genome context of PUR-α
homologues differs among strains, suggesting that the
bacterial PUR-α is not part of an operon. In the
genomes of all Firmicutes, SSB is consistently encoded
between two ribosomal proteins, but this arrangement is
not maintained in other phyla and might not have
functional meaning. The genome context for PC4 also varies
among strains. One interesting observation is that in some
Burkholderia and Leptospira strains, the sequences
immediately upstream from the PC4 gene are phage-related
integrases or transposases, raising the question of whether
these sequences might have been acquired by lateral
gene transfer.
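
The check for a ribosomal binding site upstream of a start codon, as described above for ydbC, can be sketched in a few lines. This is an illustration only: the function name is hypothetical, the example sequence is made up and is not the L. lactis ydbC locus, and the spacer is assumed to be the number of nucleotides between the end of the motif and the first base of the start codon.

```python
def rbs_spacing(seq, motif="AGAAAGGA", start_codons=("ATG", "GTG", "TTG")):
    """Locate the first occurrence of `motif` and the nearest downstream start codon.

    Returns (motif_start, start_codon_pos, spacer_length) using 0-based
    positions, or None if the motif or a downstream start codon is absent.
    """
    seq = seq.upper()
    i = seq.find(motif)
    if i < 0:
        return None
    end = i + len(motif)
    for j in range(end, len(seq) - 2):
        if seq[j:j + 3] in start_codons:
            return i, j, j - end
    return None

# Made-up sequence: 6 nt of flank, the motif, a 6-nt spacer, then ATG.
EXAMPLE = "TTAACG" + "AGAAAGGA" + "TTATCA" + "ATGAAACTG"
print(rbs_spacing(EXAMPLE))
```

The scan reports only the first motif and the nearest downstream start codon; it does not test reading frames or score partial matches to the motif.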



DISCUSSION 

L. lactis YdbC, a representative of the DUF2128 family, is a
remarkably versatile nucleic acid-binding domain that
binds ssDNA with sufficient strength to disrupt duplex
DNA and also binds ssRNA, albeit more weakly.
Remarkable structure-function similarity was found
between L. lactis YdbC, H. sapiens PC4 and the
B. burgdorferi PUR-α domains despite low sequence similarity.
PC4 is a well-characterized ssDNA-binding domain,
whereas PUR-α is known to bind both ssDNA and RNA.



Short amino acid stretches (see Asp40-Ile41-Arg42 and
Lys53-Gly54-Ile55-Thr56 in the sequence alignment) of
YdbC and PC4 are identical (Figure 2A) and highly
conserved within the DUF2128 and PC4 families,
indicating a possible evolutionary link (see later in the
text). The YdbC/PUR-α relationship is much more
remote, although Ile41, Ile55 and Glu60 are strictly
conserved among all three proteins, and Ile41 is also
strongly conserved within each individual family, which
may be incidental or may point to a fold stability role of
Ile41. The location of the conserved residues along key
elements of the secondary structure involved in nucleotide
binding underscores the importance of the residue type at
these specific locations for proper functioning of the domain.
In particular, residues Lys38, Lys50 and Lys53 are
critical for creating the positively charged
solvent-exposed surface required for interactions with
ssDNA and ssRNA.

The L. lactis YdbC dimer binds ssDNA with nanomolar
affinity under physiological conditions and non-specifically,
with no measurable bias between pyrimidine and mixed
purine/pyrimidine oligonucleotides by ITC (48) (Figure 5
and Table 2). Although complete temperature-dependent
characterizations were not performed, the binding
energetics for the YdbC interactions with pyrimidine and
mixed purine/pyrimidine oligonucleotides seem to be
consistent with those obtained for other non-specific
ssDNA-binding proteins (43,44). These protein-ssDNA
interactions are largely enthalpically driven and have
large negative binding heat capacities (ΔCp), likely
because of induced conformational changes in the bound
oligonucleotides and unrelated to binding specificity. In
ssDNA-binding proteins, the lack of base preference for
particular sites on the protein can produce chain
translocation and weakening of the ssDNA electron density
in diffraction data (7,12). The dT19G1 terminal guanine
is known to promote uniform crystallization by slowing or
preventing chain sliding and was originally sourced for use
in crystallization trials in this study (7). Here, the strategy
failed to provide adequate YdbC:dT19G1 crystals for X-ray
diffraction.

Topologically, the binding mode of dT19G1 to YdbC is
similar to that reported for PC4 (7) and covers the entire
positively charged (top) face of the protein (Figures 1C and
4A). As no attempt was made at enforcing similar dihedral
angles, slight differences were found in the ssDNA
backbone, sugar and exocyclic angles in the YdbC and
PC4 complexes. In either case, the conformation is
dominated by the common anti base orientation and
C2′/C3′-endo puckering (Supplementary Tables S1 and
S2). The C1′-exo conformation for the T3 nucleotide
indicates dynamics of the sugar ring at that site. Strong
symmetric protein:ssDNA contacts extend along the top
centre β-ridge (positively charged surface) from the β1-β2
loop to the β3-β4 loop; a total of seven bases on each side
of the dT hairpin contact the symmetric YdbC protomer
(Figure 4B and C and Supplementary Figure S14). The
N-terminal Lys4-Leu5-Lys6 segment participates in complex
formation and becomes ordered on binding. Four of seven
nucleotides form base-aromatic stacking interactions
with the protein. Bases at T4 and T5 positions are
Figure 6. Neighbour-joining tree of YdbC homologues compared with sequences within the PC4 and PUR-α families. Sequence accession
(GenBank ID) numbers are in parentheses. Sequences and DUF2128 are in bold to highlight their significance to this study. The PC4 homologue of
Desulfobacca acetoxidans was used as the outgroup. Bootstrap values >50 are shown. Bar indicates 0.1 substitutions per amino acid position.



stacked and buried in the centre of the protein concave β
face. The Asp40-Ile41-Arg42 site of conservation between
YdbC and PC4 forms key hydrogen-bond interactions to
the T5 pyrimidine ring. The T3 position is the most
solvent exposed, showing only interactions with Lys50
(Figure 4D). There is no evidence that higher order
oligomers are formed in the presence of ssDNA. Although
binding ssDNA in a manner analogous to the PC4
structure (7), YdbC forms more extensive contacts with
ssDNA, and its interactions are dominated by aromatic
stacking. Analogous to PC4, YdbC is capable of
disrupting duplex DNA and binding to the resulting open
strands (Supplementary Figure S17) (11). Here, we
provide NMR evidence that the overall fold of YdbC in
the YdbC:dsDNA versus YdbC:ssDNA complex is
preserved while the protein sequesters the open strands.

The binding of YdbC to ssRNA is weaker (in the
100 µM range) for a mixed purine/pyrimidine 21-nt
oligonucleotide. Similar YdbC binding epitopes for ssRNA
versus ssDNA were deduced by CS perturbation
mapping (Supplementary Figure S18). Although the
PUR-α interaction with nucleic acids has not been
structurally characterized, its similarity to the well-studied
Whirly proteins in plants suggests completely different
binding modes (49) to those of YdbC/PC4.

The findings reported herein for YdbC are likely to
characterize the entire DUF2128 domain family.
Analysis shows that ssDNA binding also occurs for
Enterococcus faecalis EF_3132, another member of the
DUF2128 protein family (Supplementary Figure S15).
An important question arises with domains that are
structurally and functionally similar but whose sequence
identity is <15%: should they be grouped under
the same superfamily, or are the differences sufficient to
claim the discovery of a novel ssDNA-binding domain? Here,
we show that YdbC and PC4 share strongly conserved
short-sequence motifs that are clearly poised to impact
the function. Structure-based sequence alignment
proved a useful starting point for bioinformatics
characterization of sequences whose similarity would
normally be too low for meaningful examination. The sequence
analysis, built around structurally aligned sequences,
shows that the YdbC (DUF2128), PC4 and PUR families
cluster in distinct regions of the sequence space
(Figure 6). However, DUF2128 and PC4 seem
closer to each other than to the PUR domain. The
phylogenetic distribution of PC4 and PUR domains extends to
both the prokaryotic and eukaryotic domains, although
it seems to be restricted to only a few well-defined
prokaryotic phyla in both cases, whereas DUF2128 has so far only
been identified in prokaryotes, primarily in Firmicutes. In
addition, the PC4 sequence of A. cellulolyticus, which forms a
branch with the DUF2128 cluster, suggests that DUF2128
and PC4 are distant members of the same superfamily.
Our findings were communicated to the Pfam group, which
independently validated our results. In the upcoming
database release (Pfam 27.0), the DUF2128 (PF09901)
family will be merged with the PC4 (PF02229) family. The
genome context of the genes encoding YdbC, PC4 and
PUR-α is consistent with these genes being expressed as
monocistronic transcription units. For YdbC, the finding
is also supported by the presence of a ribosomal-binding
site upstream of the translation start site, and a gene
encoded in opposite orientation downstream of ydbC.

E. coli cells transformed with the human PC4 gene
have shown enhanced protection from oxidative damage
(50). It is conceivable that YdbC could have similar or
general DNA repair functions in L. lactis and other
prokaryotic members of the DUF2128 family. The biological
implications of the newly uncovered ability of YdbC to bind
ssRNA require further study but may be unique to the
prokaryotic branch in the context of this new PC4
superfamily.

In summary, the structural, thermodynamic and bio- 
informatics analyses presented here demonstrate that 
YdbC, and indeed most members of the prokaryotic 
DUF2128 domain family, is a multifunctional nucleic 
acid-binding domain with high affinity for ssDNA. 
Given the industrial and biomedical applications of
L. lactis, further functional characterization of
YdbC should be of general interest.